The In-Basket Test*


NORMAN FREDERIKSEN, D. R. SAUNDERS, AND BARBARA WAND 
Educational Testing Service 


I. INTRODUCTION 


THE Officer Education Research Laboratory,
Air Force Personnel and
Training Research Center, has concerned 
itself for some time with the problems of 
evaluation of the effects of instruction in 
Air University. In connection with this 
work, Educational Testing Service was 
asked to study the desired outcomes of 
training in the Command and Staff 
School (a part of the Air University, 
formerly known as the Field Officer 
Course) and to develop prototype meth- 
ods to determine how well these objec- 
tives are being achieved. This report de- 
scribes an attempt to develop such pro- 
totype methods. Although the research 
was directed toward evaluation of the 
curriculum at the Command and Staff
School and not toward assessment of in-
dividual officers, the discussion has dis-
tinct implications for the development of
instruments to evaluate individual per-
formance.

The problems of assessment in areas
demanding a high level of performance
present a challenge to those interested in
new techniques of measurement. There
is a clear need for instruments which will
measure such complex skills as the ability
to organize discrete pieces of informa-
tion, to discover the problems implicit in
a situation, to anticipate events which
may arise because of such problems, and
to arrive at decisions based on a large
number of considerations. These and
other skills are continually demanded of
administrative officers in key positions.


* This research was supported in whole or in
part by the United States Air Force under Con-
tract No. 33(600)-5833, monitored by Officer Edu-
cation Research Laboratory, Air Force Personnel
and Training Research Center, Maxwell Air
Force Base, Alabama.

The project was conceived by the Human Re-
sources Research Institute when it was under the
command of Major General Carroll. This organi-
zation has since become the Officer Education Re-
search Laboratory and a unit of the Air Force
Personnel and Training Research Laboratory,
Air Research and Development Command. Dr.
Samuel M. Goodman and his successor, Dr. Don-
ald J. Malcolm, have acted as contract monitors;
they and their staff have been very helpful on
numerous occasions. The project could not have
been conducted without the active collaboration
of Colonel Walker and Colonel Adams, successive
Commanders of the Command and Staff School,
ACSC, Air University, and their Curriculum
Planning Board. Lieutenant Colonel Sheeks and
Lieutenant Colonel Wall, members of this Board,
served effectively as our liaison with CSS. Colonel
Ritchey, Director of General Courses, acted as
liaison between ACSC and ETS for the first phase
of the project. Dr. T. F. Staton, Director of Edu-
cational Assistance in ACSC, has been particu-
larly helpful in the course of the study.

Acknowledgment is due a number of ETS
staff members who worked on the project at
various times and in various capacities. During
the first phase of the project Dr. Warren Findley
was Principal Investigator and Dr. Paul Diede-
rich, Dr. Paul Freeman, Dr. Harold Gulliksen,
Dr. William G. Mollenkopf, and Mr. Charles
Allen contributed in various ways to the study.
Dr. Gulliksen and Dr. Mollenkopf aided in the
development of the in-basket problems. Miss
Henrietta Gallagher provided technical help in
the analysis of the data and Mrs. Marjorie
Tulinski provided assistance in scoring. Dr.
Irving Lorge served as a consultant to the project.


At this level of functioning, tests of in-
tellectual ability bear a lower relation to
performance than they do to performance 
on tasks of a less complex nature, partly 
because selection on the basis of intelli- 
gence has already taken place, and partly 
because administrative responsibilities 
appear to demand additional skills. 

Several earlier attempts to devise situational 
tests which would evaluate performance at this 
level have been reported in the literature. For 
instance, use has been made in Great Britain of 
such tests in selecting men for responsible posts 
in the Civil Service, for selecting industrial exec- 
utives (2, 3), and by the War Office Selection 
Board for evaluating officer potential (4). In 
Great Britain, however, the tendency has been to 
use group problem-solving techniques more fre- 
quently than written tests. In Australia, Lafitte 
(6, p. 107) reports the development of a written 
device which, in its conception, is not unlike the 
test described in this monograph. 

Considerable imagination and versatility have 
been demonstrated in the work on these new 
techniques. At the same time, the proponents (7) 
of these innovations have frequently expressed 
their scepticism of the value of objective, scora- 
ble, psychological tests. Thus, it is often the case 
that in attempting to solve the problems of eval- 
uation of performance in high-level jobs the 
test-maker is considered to be facing the choice 
of constructing objective, reliably scored, but 
relatively insensitive instruments, or developing 
a more sensitive measure which resists attempts 
to use it reliably and objectively. 


The instrument which is described in 
this report is the result of an attempt to 
devise a sensitive measure which may at 
the same time be objectively and reliably 
scored, and proceeds from a faith that 
progress toward both goals of sensitivity 
and of objectivity may be made in one 
operation. 

This instrument, which has been called 
the In-Basket Test, is a situational test 
presented in written form and group ad- 
ministered. The briefing on the nature of 
the problems and the presentation of the 
problems are carried out in such a way 
that the information available to the can- 
didate is the same for all candidates. The 
test allows a great freedom in response. 
The problems are presented in such a 


way that it is up to the candidate first to 
discover the problem and only then to 
organize an attack. Although the In- 
Basket Test was designed to represent the 
situation faced by the Field Officer in the 
Air Force, material suitable to other 
areas of experience may readily be 
adapted to this form. 

It is hoped that the description of the 
steps taken in developing this instrument 
and of the problems encountered along 
the way, as well as the recommendations 
for further improvement of the test, may 
be helpful to those who are working on 
instruments of this kind. 

The first phase of the research involved 
a careful study of the curriculum and of 
the objectives of the course in order to 
determine quite specifically what aspects 
of their work the officers were expected to 
perform more effectively as a conse-
quence of the training.* Students in the
Command and Staff School (CSS) are
mostly of the rank of major and lieu-
tenant colonel and have been specially
selected for training to fit them for
greater administrative responsibilities as
field grade officers. This training includes
courses with such titles as Organization,
Management, Personnel, Intelligence,
Operations, and Logistics. The main
source of information about the objec- 
tives of the instruction was a series of 
interviews with instructors and school 
officials. The instructors were asked to 
state what on-the-job activities they 
would expect graduates to handle more 
expertly as a consequence of the instruc- 
tion. An attempt was made to get the 
instructors to avoid generalities and to 
describe observable behaviors which 
would indicate whether or not a student 
had attained the desired objective. 


* This phase of the study was carried out under 
the direction of Dr. Warren G. Findley (1).

Altogether more than 500 such state-
ments were collected. In order to put
this large number of behavioral descrip- 
tions of desired outcomes into a more 
orderly system, the next step was to 
classify the statements into categories 
which would correspond to psychologi- 
cally meaningful functions. 

Twelve categories make up the system 
of classification which was evolved. In 
six of these categories behaviors are pri- 
marily individual; that is, they are be- 
haviors which could be exhibited by a 
person all alone. The other six categories 
involve behaviors that are primarily in- 
teractive; that is, they involve a relation- 
ship with other people. Four of the in- 
dividual categories were selected as the 
primary focus of the evaluation instru- 
ment to be developed. 

The categories may best be defined by 
first giving some examples of statements 
and then the term that has been chosen 
to represent the statements in that cate- 
gory. 

Below are some selected statements of 
objectives which fall into one of the 
categories: 


Carefully follows established supply procedure 
under normal circumstances. 


Deals only with the civilian personnel officer in 
all matters of civilian employment. 


Refrains from making inappropriate requests of 
units on an Air Force base. 


These statements assert that the gradu- 
ate of the Command and Staff School is 
likely to make efficient use of routines, 
employing actions appropriate to the 
scope of his assignment, and using chan- 
nels and “SOP” (Standard Operating 
Procedure) to advantage. 

Here are some examples of another 
type of behavior: 


Readily plans for and makes changes in pro-
cedures which are consequences of the introduc-
tion of new research developments.


Changes the organizational pattern of a base 
without hesitation when it seems desirable. 


Uses the B-29 bomber not only for high-level 
bombing, but also for tactical support at low 
level, for delivering napalm bombs, and for other 
unconventional uses. 

These statements are alike in demand- 
ing flexibility, adaptability, and willing- 
ness to introduce change. 

Here are some additional examples:


Can comprehend the effect on AF activities of
possible political-economic events such as the
closing of highways and railways into Berlin.


Anticipates and makes projections of his plans 
many months in advance. 


Prepares alternative operational plans, based on 
different contingencies. 

These statements assert that the grad- 
uate of the Command and Staff School is 
relatively likely to show foresight, that 
is, to anticipate the feel of future situa- 
tions. This involves anticipating possible 
as well as probable consequences and pro- 
viding for contingencies. 

Here are some statements which fall 
into still another category: 

In selecting weapons to achieve a stated objec-
tive, considers the various available weapons first
from the viewpoint of efficiency, and second from
the viewpoint of economy.


In deciding upon or advising concerning a partic- 
ular decision to be made, includes cost as an 
important factor. 


Plans for making appropriate use of various 
weather conditions in combat operations. 

These statements assert that the grad- 
uate of the Command and Staff School is 
likely to evaluate data effectively. This 
will involve judgment as to what data to 
include as pertinent and what to exclude 
as irrelevant to a solution. 

All four of these categories of behavior 
are individual behaviors. The other two 
categories of individual behaviors are
knowledge and effective guidance of the 
decision-making function of the unit. 
Since the interactive behaviors were 
expected to require relatively unwieldy 
evaluation methods, priority was given to 
developing measures of the individual 
types of behavior. Also, some of the cur- 
riculum divisions, such as Communica- 
tions and Electronics, and Judge Advo- 
cate General, were thought not to justify 
immediate or extensive evaluation efforts, 
in view of the small proportion of time 
devoted to them in the instruction. Thus, 
four curriculum divisions—namely, Man- 
agement, Personnel, Operations, and 
Logistics—and four major types of in- 
dividual outcomes—namely, efficient use 
of routines, flexibility, foresight, and ef- 
fective evaluation of data—were selected 
to receive the main emphasis in the de- 
velopment of evaluation methods. 


II. GENERAL DESCRIPTION OF THE 
IN-BASKET TEST 


Purposes of the Evaluation Instruments 


A measurement device was planned 
which would provide a separate score for 
each of the four selected functional cate- 
gories of behavior, and which would sup- 
port curriculum development by yielding 
information pertinent to the related cur- 
riculum divisions. Thus, the In-Basket 
Test resulted from an attempt to develop 
measuring instruments which would per- 
mit an evaluation of the extent to which 
students profited from the aspects of the 
instruction which aimed at improving 
their ability to use Standard Operating 
Procedure (SOP), increasing their flexi- 
bility, improving their foresight, and in- 
creasing their ability to evaluate data 
effectively. The intent was not merely to 
discover if the students had mastered 
textbook knowledge about flexibility, for 
instance, but rather to find out whether
or not the graduates of the course ex-
hibited in their own behavior on the job
the characteristics which were being 
sought. From one point of view, a cri- 
terion measure was being developed as a 
means for evaluating the effectiveness of 
Air Force administrative officers. 


Description of the In-Basket Materials 


The materials which have been de- 
veloped are called collectively the In- 
Basket Test. A large amount of the daily 
work of an administrator centers around 
the contents of his in-basket. The In- 
Basket Test consists in putting a candi- 
date into a realistic situation which calls 
upon him to deal appropriately with 
such material as an Air Force officer 
might find in his in-basket. 

The form of the test which was tried 
out at Maxwell Air Force Base involved 
eight hours of testing, during which each 
candidate was required to play four roles 
in succession. In one two-hour period the 
candidate played the role of a Command-
ing Officer of a hypothetical Composite
Wing, in another the Wing’s Director of
Materiel (D/M), and in other periods 
Director of Personnel (D/P) and Director 
of Operations (D/O). In each of these 
roles he was given an in-basket contain- 
ing incoming letters, memoranda, staff 
studies, letters prepared for his signature 
by subordinates, and other similar ma- 
terial. He was given suitable forms on 
which to write answers, and his directions 
were to go to work as though he were 
actually on the job. 

A common, and often valid, objection 
to situational tests is that the only reason- 
able response to a situation is, “It de- 
pends.” Examinees feel that it is unfair 
to be presented with a complex situation 
in four lines and then be expected to de- 
scribe the best action. An attempt was 
made to overcome this difficulty in the 
case of the In-Basket Test by presenting
a sufficient amount of information to 
make it unreasonable to say that the 
only correct answer is “It depends.” This 
was done in two ways. First, a consider- 
able amount of background information 
about the hypothetical composite wing 
was provided for study by the candidates 
prior to the time when they actually took 
the test. This information included such 
items as a statement of the mission of the 
wing, a short history of the wing, an or- 
ganization chart, a table showing the 
strength of the wing, a roster of the key 
officers of the wing, maps showing the lo- 
cation of Pine City Air Force Base, and 
maps indicating the landing strips and 
buildings of the Base. Second, the situa- 
tion was so arranged that an adjutant 
supposedly had studied the contents of 
the in-basket and had added from the 
files appropriate documents to explain 
the letters or memoranda and to provide 
needed background information. In mak- 
ing a decision in a real-life situation one 
never has all the relevant information. In 
this respect the test situation was prob- 
ably not greatly different from a situation 
in real life. 

The development of the problems 
which became parts of the In-Basket Test 
was accomplished through close coopera- 
tion with officers of the Air Force, and 
involved writing letters, memoranda, and 
staff studies which were like those which 
actually might be found in the in-basket 
of an Air Force officer. This phase of the 
work is described in greater detail in a 
later section. 

When the candidate appeared to take 
the test, he had already been given an 
opportunity to study the background ma- 
terials. His instructions were roughly as 
follows: 


Today you are asked to take the role of Direc- 
tor of Personnel of the 71st Composite Wing. 


The previous Director of Personnel, Lt. Col. 
Hart, was killed in an auto accident, and you 
have been assigned to take his place. A manila 
envelope on your desk contains the materials 
which have collected in Hart’s in-basket together 
with additional material placed there by the Ad- 
jutant for your guidance. Your job is to read 
your mail and take appropriate action as though 
you are actually on the job. Write the appropri- 
ate notes, memos, letters, or directives. Take as 
much action as you can with the information 
available to you. You are limited in that no more 
information can be obtained during the next few 
hours and you can communicate only in writing. 

In his Director of Personnel in-basket,
the candidate found such items as (a) a 
letter for the policy file from Col. Good- 
fellow, the Wing Commander, which 
states that it will continue to be the 
policy of the wing to pay constant atten- 
tion to the problem of morale; (b) a 
letter from the Personnel Officer regard- 
ing the conduct of Airman Third Class 
Joe Doakes, who is a personnel problem, 
asking that Doakes be given a talking-to; 
(c) a note from Col. Goodfellow request- 
ing that an appropriate policy statement 
with respect to hardship and bad conduct 
discharges be drafted; this note is backed 
up by a memorandum from the Legal 
Officer describing how punishment had 
been escaped by an airman through the 
hardship discharge, and another docu- 
ment showing that the former Director 
of Personnel has concurred in the Legal 
Officer's recommendation; and (d) a 
memorandum from the Civilian Person- 
nel Officer regarding a visit by a delega- 
tion of civilian employees who have a 
grievance. These are only a few of the 
problems found in the in-basket. 

There is no one-to-one relationship be-
tween problems and documents pre-
sented to the candidate. Some of the doc- 
uments are included merely as statements 
of policy—as additional background in- 
formation. Sometimes a problem is rep- 
resented by a combination of two or 
three documents which are not physically 
together. In other instances the real prob-
lem may not be the one stated by the
writer of a memorandum, but rather one 
which is implicit in what is stated. 


III. HOW THE IN-BASKET TEST
WAS DEVELOPED


Development of the Setting

Logically, the first step was to develop 
some concept of a basic situation or situ- 
ations in which the problems would be 
set. After some of the problems had been 
substantially prepared, it was decided to 
place all of the problems in the setting of 
an imaginary “71st Composite Wing.” 
Since all four roles are placed in the same 
imaginary wing, a minimum of confusion 
was created for the subjects who are re- 
quired to play each role in turn; also, an 
opportunity was created for the exami- 
nees to carry over information learned in 
one role to subsequent roles. By placing 
the imaginary wing in Maine, an oppor-
tunity was created to use problems de-
pendent on adverse weather conditions
and, by making the wing a “composite” 
wing, opportunity was created to use 
problems involving several types of air- 
craft. 

The actual process of developing the 
situation went hand in hand with the 
development of specific problems that 
would fit the situation, until a consider- 
able body of factual detail about the 71st 
Composite Wing was built up. Forms 
were developed, tables of strength and 
organization were drawn up, statements 
of mission and policy and records of per- 
formance were created, all in order to 
provide the background information 
needed to set and solve the problems. It 
is believed that the background informa- 
tion provided was broad enough to make 
it reasonable for candidates to take ac- 
tion on a majority of the problems. 


Use of Essays in Constructing Problems 


The first approach that was systemati- 
cally used to create problems for the In- 
Basket Test involved study of a series of 
essays written by the students in the CSS 
soon after they arrived at the school. Stu- 
dents were asked to describe some prob- 
lem in the Air Force—how it arose, the 
factors bearing on its solution, what had 
been done to solve it, and what remained 
to be done to complete its solution. The 
essay was written as if to brief a successor 
to the job when there was no opportunity 
for personal briefing. Approximately half 
of the materials developed for the July 
1953 tryout of the in-basket procedure 
were derived from ideas in these essays. 


Use of Interviews in Constructing 
Problems 


It was also considered desirable to tap 
some source of problems remote from Air 
University in order to eliminate any pos- 
sible bias toward, or away from, prob- 
lems familiar to students in the course. 
Accordingly, arrangements were made to 
visit McGuire Air Force Base, New 
Jersey, for the purpose of developing ad- 
ditional problems for the test. Interviews 
were held with about a dozen of the 
officers in the Headquarters of the Air 
Defense Wing and in the Groups di- 
rectly under this wing located at Mc- 
Guire Air Force Base. The aim of the 
interview was always to isolate problems 
which looked as if they would meet the 
requirements of the in-basket situation. 
An attempt was made to block out each 
problem in sufficient detail so that actual 
writing of the necessary materials could 
proceed. This included a discussion of 
the general nature of each memorandum, 
letter, or other document that it would 
be necessary to prepare. 


Steps in Problem Preparation


The following steps typically were ap- 
plied in preparing a problem for the In- 
Basket Test from materials obtained 
from the sources discussed above. 


1. A suitable point was selected in the devel-
opment of the problem at which its presen-
tation would require a minimum amount of
reading.

2. It was determined whether or not the prob-
lem could be related to one of the four roles
in the 71st Composite Wing (Commander,
D/M, D/O, or D/P).

3. A method was determined for presenting the
problem in this role. This method some-
times involved dividing the problem up into
several memoranda, so that the problem
would be found only by noticing the rela-
tionships between the memoranda. Or in
other cases it was handled by preparing a
poor solution for presentation along with
the problem, since correspondence that is
passed forward without recommendations is
properly returned as “incomplete staff ac-
tion.”

4. Materials that were necessary to present the
problem were written. An effort was made
to ensure that these materials were stated
briefly and clearly, so that the understand-
ing of the problem would not depend too
much on reading comprehension. Indica-
tions were given that the materials had been
seen by everyone who should have seen
them before they got into the in-basket
(unless the purpose of the problem was to
test the student’s recognition of incomplete
coordination).

5. Other supporting documents were written
that might be needed to complete the pic-
ture of the problem or to provide informa-
tion that would be needed for the intended
solution.

6. The problem materials were then reviewed
by someone familiar with Air Force termi-
nology, to ensure that the language was cor-
rect and unambiguous and that the ma-
terials did not unintentionally deviate from
normal command and staff procedures.

7. Several persons were asked, as an informal
tryout, to indicate the action they would
take on the problem; this helped to ensure
that most subjects would not be sidetracked
and that the differences in performance
would be of the intended kind, in response
to the intended problem.


Development of the Procedure for Ad- 
ministration of the Test 


In the main tryout of the test in July 
1953, the following steps were taken in 
introducing the testing situation and in 
attempting to build up motivation: 


1. The proctor reads a statement out- 
lining the ground rules of the test situa- 
tion and “explaining” how the subject 
had gotten into it. For example, the fol- 
lowing instructions were for the role of 
D/P (which was not the first role played). 


a. Today you are asked to take the role of the
Director of Personnel of the 71st Composite
Wing, Pine City AFB, Maine. The previous
D/P, Lt. Col. Charles Z. Hart, was killed
four days ago in an auto accident, and you
have been assigned to Pine City AFB to take
his place. Today’s date is Tuesday, 6 July
1954. The manila envelope on your desk
contains the materials which have collected
in Hart’s in-basket, plus additional material
placed there by the Adjutant for your
guidance. ...

b. Your job is similar to the previous ones.
You are merely to read your mail and take
appropriate action as though you were ac-
tually on the job. Write the appropriate
notes, memos, letters, or directives. Take as
much action as you can with the informa-
tion which is available to you. As before,
you are limited in that (1) no more infor-
mation can be obtained during the next two
hours, and (2) you can communicate only in
writing.

c. Please write legibly. Place the appropriate
file number on each sheet, and sign your
own name. I have additional copies of Form
71W3* if you need them.

d. When you have finished, place all of the
materials back into the envelope, including
the materials you have written. Place a
paper clip on all the papers you have
written on in order to separate them from
papers you have not written on. Leave any
unused Forms 71W3 outside of your enve-
lope. Write your name and course code
number on the outside of the envelope.

e. As you know, the findings of this test will
have no bearing on your grades, but may be
very helpful in the development of evalua-
tion methods. Therefore, you are requested
to cooperate by doing your best.

f. Go ahead.


* The subjects were asked to use Form 71W3
in writing all their messages. This is a form, de-
signed for use in the test, which is similar to
“buck slip” forms in use in the Air Force, but
not identical to any (see Fig. 1).


Development of the Scoring Procedure 


Once the problems and a procedure 
for their administration had been de- 
veloped, the major remaining problem 
was the development of an adequate sys- 
tem for scoring the test on the various 
categories of functional behavior. 


After the series of problems had been administered in
the tryout, samples of answers were drawn for
each problem recognized for scoring purposes.
Three samples were drawn, which will be re- 
ferred to as Samples I, II, and III. Sample I 
consisted of the responses of one student ran- 
domly drawn from each of the gg cubicle-groups 
used in administering the In-Basket Test. Sample 
II was similarly drawn from the cases remaining 
after constructing Sample I. Sample III was 
drawn from the cases remaining after construct- 
ing Sample II, by randomly selecting three offi- 
cers from each cubicle. It was considered desira- 
ble to stratify the samples on the basis of their 
cubicle assignment in order to eliminate any 
possible effects of communication between sub- 
jects during the course of the testing. It is known 
that there was some communication in some of 
the cubicles, but it is believed that this was 
largely confined to procedural matters of the test 
administration. 
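
The stratified drawing of Samples I, II, and III may be illustrated by the following minimal sketch in Python. It is an illustration only and is not the procedure actually used in the tryout; the function name `draw_samples`, the data layout (a mapping from a cubicle identifier to a list of examinee identifiers), the fixed random seed, and the example data are all assumptions introduced here.

```python
import random


def draw_samples(cubicles, seed=0):
    """Draw Samples I, II, and III, stratified by cubicle-group.

    `cubicles` is a hypothetical mapping: cubicle id -> list of examinee ids.
    Shuffling each cubicle once and slicing is equivalent to drawing
    successively, without replacement, from the cases that remain.
    """
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    sample_1, sample_2, sample_3 = [], [], []
    for cubicle_id in sorted(cubicles):
        members = list(cubicles[cubicle_id])
        rng.shuffle(members)
        sample_1.extend(members[0:1])  # Sample I: one officer per cubicle
        sample_2.extend(members[1:2])  # Sample II: one more from those remaining
        sample_3.extend(members[2:5])  # Sample III: three more from those remaining
    return sample_1, sample_2, sample_3


# Invented data: three cubicles of six examinees each.
cubicles = {c: [f"{c}-{i}" for i in range(1, 7)] for c in ("A", "B", "C")}
sample_1, sample_2, sample_3 = draw_samples(cubicles)
```

Because every cubicle contributes to each sample, any communication within a cubicle would affect the three samples about equally, which is the purpose of the stratification described in the text.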

The responses of the students to each problem 
in Sample I were sorted into categories represent- 
ing the various types of response to the problem. 
For example, concurring in a recommendation 
might be one type of response (although the 
concurrence might be indicated in a great variety 
of ways); refusing to concur might be another 
type; and referring the problem to higher author- 
ity might be a third type of response. Such cate- 
gories were further combined in an effort to 
arrive at a relatively small number of types of 
response that were clearly distinct from one 
another and easily definable. Between five and 
fifteen types of response were identified for each 
of the various problems, based on a study of the 
responses in Sample I. 

In setting up these types of response it was 
necessary to ignore a great deal of information 
and to abstract from each response those aspects 
of the response which were relevant to the func-
tional category of behavior (e.g., foresight or
flexibility) which the problem was supposed to
measure. Even striking differences from student 
to student in style or tone of the communication 
were ignored in setting up the types of response. 
This abstracting of a limited aspect of the be- 
havior necessarily meant loss of a great deal of 
information but presumably had the advantage 
of making the scores reflect only the variable 
being measured. 

The types-of-response lists, obtained from study 
of Sample I responses, were tested by using them 
to classify the responses from Sample II. It was 
found necessary, in order to make the system 
work for Sample II, to redefine or broaden the 
concepts of some of the types of response. On 
the whole, the types of response obtained from 
the first sample were found to be adequate. 

On the basis of the study of Sample I and II 
responses, some problems were found to be un- 
suitable because of too great a uniformity in the 
answers or because of too great a scatter in types 
of response. Such problems were eliminated, so
far as further scoring was concerned. In other in-
stances it was found advisable to score the prob-
lem for a different functional category of be- 
havior than it was originally designed to measure. 
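
The screening of problems for too great a uniformity or too great a scatter of responses can be sketched as follows. The report gives no numerical rule for either condition, so the two cutoffs, the function name `screen_problem`, and the example responses are hypothetical and serve only to make the idea concrete.

```python
from collections import Counter


def screen_problem(response_types, uniform_cutoff=0.9, scatter_cutoff=0.5):
    """Classify a problem by how its responses spread over the types.

    Both cutoffs are hypothetical; the report states no numerical rule.
    """
    if not response_types:
        raise ValueError("no responses to screen")
    counts = Counter(response_types)
    total = len(response_types)
    top_share = counts.most_common(1)[0][1] / total  # share of the commonest type
    type_ratio = len(counts) / total                 # distinct types per response
    if top_share >= uniform_cutoff:
        return "too uniform"
    if type_ratio >= scatter_cutoff:
        return "too scattered"
    return "usable"


# Invented response classifications for three imaginary problems.
uniform = screen_problem(["concur"] * 19 + ["refer upward"])
scattered = screen_problem(list("abcdefghij"))
usable = screen_problem(["concur"] * 10 + ["refuse"] * 6 + ["refer upward"] * 4)
```

A problem on which nearly everyone gives the same type of response cannot discriminate among examinees, and one on which almost every response is its own type cannot be scored with a short list of types; the sketch flags both cases.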

Following the study of Samples I and II, final 
lists of types of response were prepared for each 
remaining problem. These were submitted to 
two panels of expert judges in the Air Force. 
One panel consisted of twelve students in the 
Air War College (AWC). The other panel con- 
sisted of twelve staff members from Air Com- 
mand and Staff College (AC & SC) (of which CSS 
is a part). Each panel was chosen to include three 
officers particularly qualified to judge the prob- 
lem materials in each of the four roles of the 
In-Basket Test. 

Each subpanel of three officers was asked to 
consider the problems in the in-basket for the 
appropriate role and to assign scoring values to 
each type of response listed for each problem 
being scored. The subpanels from AWC worked 
independently from the subpanels from AC & 
SC, but each subpanel worked as a team. 

The judges were first asked to consider the in- 
basket as a whole, and to assign a rank-order of 
importance to the problems being scored. The 
level of priority given to each problem by the 
judges was later used in determining the scoring 
weight assigned for failing to respond to the 
problem. The agreement between the rankings 
of two subpanels of judges was invariably almost 
perfect. The judges were then asked to consider 
each scorable problem separately, and to assign 
scoring values on a five-point scale to each type 
of response, such that each point on the scale was 
used at least once for each problem. 

The amount of agreement between the two
panels of judges on the value to be assigned to
the types of response was quite variable, ranging
from good to poor. Occasionally even negative
correlations were observed. The first column of
Table 1 presents the correlation for each problem.


TABLE 1

CORRELATIONS BETWEEN SCORES ASSIGNED BY TWO PANELS OF EXPERTS AND SCORING
WEIGHTS FINALLY ASSIGNED TO TYPES OF RESPONSE FOR EACH IN-BASKET PROBLEM*

Column 1: Correlations Between Scores Assigned by Two Panels (AWC and ACSC)
Column 2: Correlations Between a Composite of the Scores of Both Panels and Final Scoring Weights
Column 3: Correlations Between Scores Assigned by AWC and Final Scoring Weights
Column 4: Correlations Between Scores Assigned by ACSC and Final Scoring Weights

.87
.42
.28

* In cases where the agreement between the Air Force panels falls below .35,
separate correlations with the AWC and ACSC panels are reported.
b No final scoring weights assigned.


The test-makers, of course, had their
own ideas about what the scoring weights
should be. In the case of some problems
there was substantial agreement among
all three groups: the AC & SC panel, the
AWC panel, and the test-makers. In
other instances the two military panels
were in poor agreement; here it was usually
found that the test-makers agreed
substantially with one of the panels. In
only two of the problems was the lack of
agreement so great as to suggest that the
problem be dropped.

TABLE 1 (continued)

C059   .64
C213   .42   .87
C229   .59   .62
C446   .69   .84
C496
C525   .74   .92
C650   .82   .92
C705   .86
C747   .47   .99
C983   .58   .81
M087   .80
M215
M278   .78
M325   .79   .96
M534   .66   .85
M552   .59   .88
M916   .64   .82
M941   .00
P107   .62   .77
P139(b)   .00   --
P247   -.20   .59
P406   .49   .88   --
P461   -.27   -.20
P526   .43   .64
P607   .01
O116   .07   .28
O290   .72   .68
O301   .78
O500a(b)   -.31
O500b   .22   .58
O568   .40   .86
O705   .54   .78
O786   .66   .93


The decisions made in determining the
final scoring values were generally arrived
at by the following process:


1. A careful examination was made of the 
weights suggested by the Air Force panels, in 
order to determine which of the original func- 
tional classes of behavior was apparently being 
given greatest weight in determining their judg- 
ments. These determinations were compared with 
the original notion of the functional class of 
behavior for which the problem had been written. 
A decision was then made as to the functional 
class to which the final scoring of the problem 
would be assigned. By this time, a number of 
problems had become assigned to another of the 
curriculum objectives, namely, Guidance of De- 
cision-Making. 

2. The various types of response were ordered 
by the score developers in terms of their knowl- 
edge of the functional category of behavior de- 
termined in step 1. In doing so, they attempted 
to avoid being influenced by the detailed rank- 
ings made by the Air Force experts. This pro- 
cedure was designed to ensure that the final 
scoring weights would reflect as nearly as possi- 
ble a single functional category of behavior, but 
would at the same time correlate as highly as 
possible with the judgments of the Air Force 
panels of judges. 


An indication of the success of this 
procedure is provided in Table 1, in 
which are presented the actual correla- 
tions between the composite (over-all 
average) Air Force judgments and the 
final scoring weights. (In cases where the 
agreement between the Air Force panels 
as shown in column one of Table 1 falls 
below .35, separate correlations with the 
AWC and AC & SC panels are reported.) 

Scoring a paper, then, consisted in 
reading a response to a problem, compar- 
ing the response with the listed types of 
response for that problem, deciding to 
which type the particular response be- 
longed, and then assigning the numerical 
value which corresponded to that type. 
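
Once a response has been classified, the scoring step described above reduces to a table lookup. The following Python fragment is a minimal sketch of that step only. The problem code `X001`, the type labels, the weights, and the penalty for an unanswered problem are all hypothetical placeholders; the actual lists of types of response and their values are in the scoring manual and are not reproduced here.

```python
# Hypothetical scoring sheet: problem -> scoring values per type of response.
# Labels and numbers are illustrative only, not taken from the scoring manual.
SCORING_SHEET = {
    "X001": {
        "weights": {"refers to staff": 2, "acts personally": 0, "requests data": 1},
        # Assumed value for failing to respond; in the study this weight was
        # derived from the judges' ranking of the problem's importance.
        "no_response": -1,
    },
}


def score_response(problem, response_type):
    """Return the value for a classified response (None means no response)."""
    sheet = SCORING_SHEET[problem]
    if response_type is None:
        return sheet["no_response"]
    return sheet["weights"][response_type]


def score_paper(classified):
    """Sum the values over all problems; `classified` maps problem -> type or None."""
    return sum(score_response(p, t) for p, t in classified.items())
```

The judgment remains with the scorer, who must decide which listed type a response belongs to; the lookup only makes the final assignment of a value mechanical.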

The Sample III responses selected
from the CSS data were not examined
until the final scoring summary sheet for
each problem had been prepared and the
scoring manual had been written. The
results obtained in the scoring and analysis
of Sample III are presented in the
next section of this report.


IV. RESULTS OF THE COMMAND AND 
STAFF SCHOOL TRYOUT 


The “in-basket” materials were tried
out in July 1953 by administering them
to the entire Class 53-B of the Command
and Staff School. The administration
required four two-hour sessions, which
were distributed at intervals of about
three days through the second and third
weeks of the course. Each student in the
course played in turn the roles of Commander
(CO), Director of Materiel
(D/M), Director of Personnel (D/P), and
Director of Operations (D/O) of the 71st
Composite Wing. The purpose of the 
July 1953 tryout of the materials was to 
provide data which could be used to de- 
termine some of the operational and sta- 
tistical characteristics of the In-Basket 
Test, as well as to develop the scoring 
method. These characteristics of the test 
will be discussed under four major head- 
ings: Scoring Reliability, Over-all Reli- 
ability, Validity, and Attitudes Toward 
the Test. 


Scoring Reliability 


Reliability of scoring may be assessed 
by comparing the scores assigned by one 
scorer with scores assigned by a second 
scorer, and is expressed here in terms of
the product-moment correlation coefficient
between the scores assigned by two
scorers of the same set of responses.
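
As an illustration of the index just defined, the sketch below computes a product-moment correlation between two scorers' values for the same papers. The two score lists are made up for the example and are not data from the tryout.

```python
from math import sqrt


def product_moment_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Illustrative (made-up) values assigned by two scorers to the same five papers.
scorer_1 = [4, 2, 3, 0, 1]
scorer_2 = [3, 2, 4, 0, 1]
scoring_reliability = product_moment_r(scorer_1, scorer_2)
```

The function is undefined when either scorer gives every paper the same value, since the corresponding variance is then zero.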


The scoring reliability may be expected to be 
a function of the amount and quality of training 
that the scorers have had. Appropriate training 
would ordinarily involve direct experience in 
scoring a large number of answers and in dis- 
cussing the scoring of these answers with other 
more experienced scorers. Relevant experience 
with the job situation which the In-Basket Test 
is designed to simulate would be very helpful in 
learning to score the test.


TABLE 2

CORRELATIONS BETWEEN SCORERS A, B, AND C
FOR SINGLE DIRECTOR OF MATERIEL
PROBLEMS AND FOR D/M
SCORE

Problem

M087
M215
M278
M325
M534
M552
M916
M941
D/M Total

a Scores not available for scorer C.


Two sets of scoring reliability data 
have been obtained. The first set of data 
is for three independent scorers all of 
whom can be regarded as having had at 
least some relevant previous experience. 
Two of these scorers (A and B) were pri- 
marily responsible for the development 
of the scoring procedures and manual 
and were presumably the best scorers 
available. Scorer C had some contact with 
the military situation portrayed, as well 
as with day-to-day administrative prob- 
lems in his own work, but had not been 
intimately associated with the develop- 
ment of the scoring procedures. The cor- 
relations reported in Table 2 are for the 
first 50 cases of Sample III D/M prob- 
lems for scorers A and B and the first 20 
cases for scorer C. 

The second set of data pertaining to 
scoring reliability is for two independent 
scorers, B and D. Scorer D worked di- 
rectly from the final form of the scoring 
manual without much special training 
and without benefit of prior experience 
with the military situation. These data 
indicate how reliably the In-Basket Test 
may be scored by someone working 
mostly from the printed instructions with 
a minimum of special training. Table 3
presents the resulting correlations for
each problem, for the total score for each 
role, and also for the grand total score. 
Even under these conditions scoring re- 
liability is reasonably satisfactory. It is 
clear from these data, considered as a 
whole, that the In-Basket Test can be 
scored with a reasonably high degree of 
reliability. 


Over-all Reliability 


The problem of determining the over- 
all reliability of the In-Basket Test is 
somewhat unusual. The rationale of the 
test calls for the measurement of a series 
of distinct, conceptually different dimen- 
sions—namely, the functional categories 
of behavior identified in the study of 
curriculum objectives. Therefore, one 
aspect of reliability which was of interest 
was the degree to which problems de- 
signed to measure the same dimension 
of behavior did, in fact, measure the same 
thing. At the same time, the structure of 
the test provides four distinct roles, which 
are the separately timed portions of the 
total test. This permitted a test of the re- 
liability of the total score of the In-Basket 
Test under the assumption that the four 
roles are four equivalent forms of the 
test. Thus, a two-way breakdown of the 
problems is possible; and in designing 
the testing battery an effort was made to 
provide problems that could be sorted 
into the cells in such a two-way break- 
down. It should be noted here that all 
values indicating over-all reliability are 
based on the scores assigned by one 
scorer, scorer D. 


TABLE 2 (continued)

Scorers A & B   Scorers B & C   Scorers A & C
N=50            N=20            N=20

.78 .80
.85
.88 .87 .85
.77
.69 .69
.63 .66 .47
.48 .51
--
.83 .69


TABLE 3

CORRELATIONS BETWEEN SCORERS B AND D FOR SINGLE IN-BASKET PROBLEMS, ROLE
TOTALS, AND GRAND TOTAL FOR COMMAND AND STAFF SCHOOL SAMPLE III

Commanding Officer Problems

Director of Materiel Problems

Problem

C059
C213
C229
C446
C525
C650
C705
C747
C983

Total Score on CO Problems

N=110

Problem   Correlation

M087
M215
M278
M325
M534
M552
M916
M941

Total Score on D/M Problems

Director of Personnel Problems

Director of Operations Problems

N=102
Problem

P058
P107
P247
P406
P461
P526
P607

Correlation
.83

Total Score on D/P Problems

N=106
Problem

O116
O290
O301
O500
O568
O705
O786
O946

Total Score on D/O Problems

Correlation

Total of All Problems r = .90


The most direct evidence bearing on the initial
hypotheses that underlay the preparation of the
items and their assignment to functional categories
is provided by the intercorrelations of the
items intended to measure a particular category
of behavior. One would not expect the intercorrelations
of single items to be high. Among
items which make up a test or subtest, however,
generally positive correlations would be required
in order to show that the items are homogeneous
in the sense that they all tend to measure the
same ability. Such homogeneity is a sufficient
condition for high reliability.

Intercorrelations of items intended to measure 
particular functional categories are presented in 
Tables 4, 5, 6, 7, and 8. It is evident that there is 
not a high degree of homogeneity among the 
problems. In the cases of Categories II, III, IV, 
and V (the four categories originally established 
as the objectives for measurement), the correla- 
tions are predominantly positive; in the case of 
the problems placed in Category VI, Guidance 
of Decision-Making, even this cannot be said. 
Category III (Flexibility) evidently shows the 
highest amount of homogeneity, and Category 
VI (Guidance of Decision-Making), the lowest 
amount. Category VI was not originally included 
among the functional categories to be represented 
in the test; the large representation of this cate- 
gory in the test resulted from the reclassification 
of problems because of the opinions of the Air 
Force committees. All problems in this category
were originally written to represent other functional
categories.


None of the categories shows enough
homogeneity to warrant computation of
a separate reliability coefficient. If such
a coefficient were computed for the reliability
of the Category III total score, it
is guessed that it would be equal to about
.40 or .50.


It will be noticed that in some of the tables
of intercorrelations, one or two problems account
for most of the negative values. In Table 4, for
example, problem O290 accounts for both of the
negative correlations, and in Table 8 problem
P461 is consistently negative. As in ordinary test
construction procedures, the homogeneity and
hence the reliability of the score could be improved
by eliminating those items which detract
from the total test. As it is, seven per cent of the
correlations reach the one per cent level of
significance.


TABLE 3 (continued)

N=112
Correlation
.75
.87 .81
.89 .65
.81 .84
.83 .62
.93 .69
.76
.88
.83
.73
.79 .64
.75
.80 .86
.80 .78
.70 .79
.83
.80 .86


TABLE 4

INTERCORRELATIONS OF PROBLEMS PLACED IN FUNCTIONAL CATEGORY II
(USE OF ROUTINES)

Problem | C446 M916 P607


TABLE 5

INTERCORRELATIONS OF PROBLEMS PLACED IN FUNCTIONAL CATEGORY III
(FLEXIBILITY)

Problem | C059 M215 P247 O568

C059
M215
P107
P247 11
O568

** Significant at the .01 level.
TABLE 6

INTERCORRELATIONS OF PROBLEMS PLACED IN FUNCTIONAL CATEGORY IV
(FORESIGHT)

Problem C747 M534 M941 P526 O116

C747 18 11
M534 -02 17
M941 06 02
P526 18
O116 11 09
O500 -06
O705 17 12

** Significant at the .01 level.


INTERCORRELATIONS OF PROBLEMS PLACED IN FUNCTIONAL CATEGORY V
(EVALUATION OF DATA)

Problem C650 M087 M278 P406 O301 O786 O946

C650 14 -10
M087 -30**
M278 -30**
P406 -02 05
O301 22* -14
O786 17 13
O946 28**

* Significant at the .05 level.
** Significant at the .01 level.


TABLE 8

INTERCORRELATIONS OF PROBLEMS PLACED IN FUNCTIONAL CATEGORY VI
(GUIDANCE OF DECISION-MAKING)

C213 C229 C496 C525 C983 M325 M552

C213 -05 14
C229 09 -02 08
C496 -10 13
C525
C705
M325
M552
P058
P461

* Significant at the .05 level.


Problem   C446   M916   P607   O290
C446              10     09    -05
M916       10            08     10
P607       09     08           -05
O290      -05     10    -05
—06 17 
15 —13 
13 
12 
28** —03 
20 
20 
TABLE 7 
—17 03 fore) —0o2 
-02 22* 17 -01
05 -14 13 28**
08 27** 11
08 05 06
27** 04
11 06 04
-07 13 10 -07
-23* -06 03 -17
-03 -06 -01
05 11 -06
06 -00
-25* 20 14 -01
03 02 -03
03 11 -01
11 -09
-03 -09

The lack of evidence of a high degree of homogeneity
should not be surprising in view of the
subjective nature of the categories. An empirical
approach to the problem of classification into
functional categories may prove to be more valid
than the judgmental approach which was used.

The next problem to consider involves
the question of whether or not the In-Basket
Test as a whole may provide a
meaningful total score. Evidence on this
point is provided by the intercorrelations
of the total scores for the four roles, and
the correlations of these with the grand
total score. This information is presented
in Table 9. The numbers in Table 9 vary
widely, with the correlations between role
total scores ranging from .06 to .45. The
three smallest correlations all involve
the relations between total score for D/P
and the other scores; none of these three
correlations is statistically significant.
The other three correlations, which do
not involve the D/P score, are all significant
beyond the .05 level, using a one-tailed
test of significance. Thus, aside
from the D/P score, there appears to be
some homogeneity in the materials
covered by the different roles. (Whether
the D/P score is a reliable measure of
something else, or is merely an unreliable
measure of the same function measured
by the other three roles, cannot be answered
by the data in Table 9.)

The information contained in Table 
9 provides a basis for estimating the reli- 
ability of the total In-Basket Test. This 
works out to be .50 for the eight-hour 
battery, and .42 for the six-hour battery 
obtained by omitting the D/P problems 
from the total test. The former estimate 
of .50 is probably too high because the 
assumption of equivalent parts of the 
test is not satisfied (8). 

TABLE 9

INTERCORRELATIONS OF THE ROLE SCORES AND
CORRELATIONS OF ROLE SCORES WITH GRAND
TOTAL SCORE ON THE IN-BASKET TEST FOR
COMMAND AND STAFF SCHOOL SAMPLE III(a)

              CO    D/M   D/P   D/O   Grand Total(b)
CO                  24    14    19    68
D/M           24          15    45    69
D/P           14    15          06    50
D/O           19    45    06          67
Grand Total   68    69    50    67

a N=92.
b Correlations of Role Scores with the Grand
Total Score are, of course, spuriously high since
the Grand Total Score is based on the sum of the
four Role Scores. The expected “chance” value
of these correlations is about .50.


These values for reliability, while they
are of definite statistical significance, are
not of sufficient magnitude to justify the
interpretation of an individual’s scores
on the In-Basket Test. Applying the
Spearman-Brown prophecy formula, it 
appears that approximately 24 hours of 
testing time would be required in order 
to achieve a reliability of the order of .75 
or more with material like that in the 
present In-Basket Test. However, it must 
be remembered that the present test is 
the first such instrument constructed, and 
the present tryout is a pretest of this first 
experimental form. It seems probable 
that higher reliability could be obtained 
by selection of the best problems, by de- 
velopment of better problems to replace 
those found to be unsatisfactory, and by 
improvement of the scoring categories 
and methods. 
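
The 24-hour figure can be checked with a short sketch of the Spearman-Brown prophecy formula. The formula itself is standard; the use of the mean of the six role intercorrelations from Table 9 as the reliability of a single two-hour role is an assumption about how the eight-hour estimate was obtained, and it is offered only as a plausibility check.

```python
def spearman_brown(r, k):
    """Reliability of a test lengthened k times, given reliability r of the original."""
    return k * r / (1 + (k - 1) * r)


def lengthening_factor(r, target):
    """Factor by which a test of reliability r must be lengthened to reach `target`."""
    return target * (1 - r) / (r * (1 - target))


# Reported estimate for the present eight-hour battery (four two-hour roles).
current_hours = 8
current_reliability = 0.50

k = lengthening_factor(current_reliability, 0.75)
required_hours = k * current_hours  # hours of testing needed to reach .75

# Assumed reconstruction of the eight-hour estimate: step up the mean of the
# six intercorrelations among role scores (Table 9) to a four-role test.
role_r = [0.24, 0.14, 0.19, 0.15, 0.45, 0.06]
mean_r = sum(role_r) / len(role_r)
eight_hour_estimate = spearman_brown(mean_r, 4)
```

Under these assumptions the lengthening factor is 3, which corresponds to the 24 hours stated in the text, and the stepped-up mean intercorrelation comes out close to the reported value of .50.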

The obtained values for reliability for
the present form are sufficiently high to
justify comparing groups, investigating
correlations between scores and various
criteria, etc. The original purposes
for which the In-Basket Test was devised 
were all of this type, and it may therefore 
be concluded that, insofar as reliability is 
concerned, the original objectives of the 
test can be met using the materials made 
available in this research. 

Another important objective of this
research was the differentiation of several
different desired outcomes of instruction
in the CSS. It was hoped that scores could
be obtained from testing that would
make it possible to determine the relative
success of the instruction in achieving
these different desired outcomes. Is there
any evidence in the present data that
would suggest that different meaningful
outcomes can be differentiated? If there
is, it may compensate for the failure of
the a priori categories of behavior to be
clearly reflected in the empirical findings.

In order to make possible the discovery 
of one or more categories of behavior, a 
complete table of intercorrelations among 
all the individual problems included in 
all four roles was computed. This table 
was inspected in an effort to isolate 
clusters of at least five problems for 
which all the intercorrelations were rela- 
tively large and consistent in direction, in 
the sense that if a problem correlated 
negatively with one of a number of posi- 
tively correlated problems it must cor- 
relate negatively with all of them. 
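
The consistency requirement just stated can be expressed as a simple check: a cluster qualifies if each problem can be given a direction (positive or negative) such that every intercorrelation becomes positive once the directions are applied. The sketch below tests only this sign condition; the requirement that the correlations be "relatively large" is left out, and the five problems and their correlations are hypothetical values chosen for illustration, not entries from the tables.

```python
from itertools import combinations


def is_sign_consistent(labels, r):
    """True if the problems can each be oriented (+1 or -1) so that all
    reoriented intercorrelations are positive.
    `r` maps frozenset({a, b}) -> correlation between problems a and b."""
    ref = labels[0]
    sign = {ref: 1}
    # A problem's orientation is fixed by the sign of its correlation with
    # the reference problem; any valid orientation must agree with this.
    for other in labels[1:]:
        sign[other] = 1 if r[frozenset((ref, other))] > 0 else -1
    return all(
        sign[a] * sign[b] * r[frozenset((a, b))] > 0
        for a, b in combinations(labels, 2)
    )


# Hypothetical cluster: problem E is negatively related to all the others.
labels = ["A", "B", "C", "D", "E"]
pairs = {
    ("A", "B"): 0.42, ("A", "C"): 0.27, ("A", "D"): 0.24, ("A", "E"): -0.14,
    ("B", "C"): 0.19, ("B", "D"): 0.15, ("B", "E"): -0.20,
    ("C", "D"): 0.17, ("C", "E"): -0.12, ("D", "E"): -0.10,
}
r = {frozenset(key): value for key, value in pairs.items()}
consistent = is_sign_consistent(labels, r)

# If E correlated positively with C while negatively with A, B, and D,
# the cluster would no longer meet the requirement.
r_bad = dict(r)
r_bad[frozenset(("C", "E"))] = 0.12
inconsistent = is_sign_consistent(labels, r_bad)
```

Fixing the orientations from one reference problem is sufficient because, if any acceptable set of orientations exists, it must agree with the signs of the correlations with that reference.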

Clusters meeting this requirement
were found; four of them are presented
in Tables 10, 11, 12, and 13. The fourth
cluster, it will be noted, consists of six
variables. Several general observations
may be made concerning these tables.

First, the four clusters presented involve, all 
told, fifteen different variables, which is almost 
one half of the individual problems available for 
inclusion in clusters. If clusters of four instead 
of five variables were sought, many of the re- 
maining problems could be brought into clusters. 
This result seems to justify the inference that 
most of the individual problems do possess some 
degree of reliability over and above mere scoring 
reliability. The obtained estimates of over-all 
reliability could thus be improved by appropriate 
regrouping of problems. 

TABLE 10
CLUSTER 1

Problem | M215 O568 O500 O786 P461

M215
O568 42
O500 27 19
O786 24 15 17
P461 -14 -20 -12


TABLE 11
CLUSTER 2

Problem | M215 O500 O568 C496

M215
O500
O568
C496
P461

27
42
22 14
-16 -14


TABLE 12
CLUSTER 3

Problem | O500 M534 O116 P461 O301

M534
O116
P461
O301 12 -18

15
28 31
-20 -23
27 14


TABLE 13
CLUSTER 4

Problem | P526 P607 C213 C983 M552 P058

P526
P607
C213
C983
M552 20
P058 14


A second general observation is that not all of
the problems are positively intercorrelated with
the rest of the problems in their clusters. In
particular, problem P461, which occurs in three
of the clusters, has consistently negative correlations
with the other problems in all three
clusters. A related observation is that the problems
scored as measures of functional Category II
(Use of Routines) tend to have more negative
correlations with other problems than do problems
scored for the other functional classes of
behavior. In particular, negative correlations are
observed between problems scored for Class II
(Use of Routines) and Class III (Flexibility).
Both of these observations suggest that the desired
outcomes of instruction in the Command
and Staff School are not merely nonhomogeneous,
but are to some extent contradictory and
conflicting in the manner of their interrelation
within the student population. The possibility
must seriously be considered that achievement
towards one objective will simultaneously represent
movement away from some other objective.
The achievement of a proper balance between
objectives thus becomes an even more difficult
problem.

A third general observation based on Tables
10 through 13 is that it might be possible, by
some method such as factor analysis, to obtain 
an empirical set of classes of functional behavior 
which would describe the domain of the prob- 
lems better than the a priori system of classifica- 
tion. Such a factor analysis would need to be 
applied to the complete table of intercorrelations 
and would be more likely to provide useful re- 
sults if an even larger number of problems were 
available for simultaneous analysis. In view of 
the generally small magnitude of the problem 
intercorrelations, it would be appropriate to con- 
duct the factor analysis according to some rigor- 
ous method, such as the Lawley Maximum Like- 
lihood Method, to ensure that the bounds of 
statistical significance were not being exceeded 
by the number and nature of the factors ex- 
tracted. 


It is of possible interest to offer an in- 
terpretation of some of the clusters rep- 
resented in Tables 10 through 13. Each 
cluster is likely to represent a group of 
problems that would turn out to have 
loadings on the same factor if a factor 
analysis were performed. 


Let us first consider Table 10. All of the problems
in this cluster seem to imply a desire on
the part of the officer to make judgments suited
to the merits of the individual cases, without
regard to standard procedures which may or may
not indicate the same conclusion. Thus, in problem
M215, both the best and poorest answers involve
acceptance of the endorsement to an inspection
report, but the best answer also indicates
the intention of conducting a special inspection
of the ammunition storage facilities. This problem
best represents the cluster, in view of its
relatively high correlations with the others. In
problem O568, both good and poor answers involve
following a training regulation. The good
answer seeks to establish the “intent” of the
regulation, and then to provide an SOP “interpreting”
it as it applies to the present situation,
whereas the poor answer merely calls for “strict”
compliance. In problem O500, both good and
poor answers call for examination of a three-hour
turn-around time, which is too long. Here, the
good answer calls for an initial meeting to discuss
the problem, whereas the poor answer
plunges immediately into a detailed study of
the problem, which may be unnecessary. In problem
P461, which indicates that hardship discharges
have been given to airmen in cases where
a bad conduct discharge would have been more
appropriate, decisions that move toward the
preparation of a policy statement on the matter
generally are regarded as better than those moving
toward consideration of the individual case;
and we observe, accordingly, that scores on this
problem have a negative correlation with the
rest of the cluster.

The cluster presented in Table 11 has four
problems in common with the one just discussed,
and can be interpreted along the same psycho- 
logical lines. Most of the problems in the clusters 
represented by Tables 11 and 12 were intended 
to measure the Flexibility category of behavior. 
They may perhaps be thought of as representing 
a more homogeneous, “purified” set of flexibility 
items. 

Let us now consider Table 12. All of the solutions
correlating with this cluster seem to imply
a willingness to take definite action on problems
that clearly call for definite action, even on problems
that clearly call for prolonged consideration.
Thus in problem M534, which best represents
this cluster, the scoring is entirely in terms
of the amount of concrete action taken in the
direction of grounding possibly defective F94
aircraft. Problem O116 is very similar, and is
scored in terms of the amount of concrete action
facilitating an investigation of the fouling of
C119 carburetors. In problem O500, one of the
two best answers is to set up immediately a
maintenance training program, while the poorest
answer is to do nothing. In problem P461, any
definite decision regarding a policy statement on
hardship and bad conduct discharges is given a
poorer score than an answer delaying the decision;
even taking no action at all is given a
positive scoring weight. Accordingly, this problem
is negatively correlated with the remainder of
the cluster. Again, in problem O301, the definite
decision to include various factors in a solution
earns higher scores, and taking no action earns
relatively lower scores. All four of the problems
positively correlated with this cluster were originally
planned as measures of the Foresight category.

A third example may be taken from Table 13,
which contains the intercorrelations of six problems.
All of the problems seem to demonstrate
a working knowledge of the organization and
appropriate functions of various offices, including
the office currently held by the subject taking
the test. The exact nature of the action taken in
each problem varies. Thus, in problems P058 and
P607, the basic problem needs to be referred to
an appropriate person, at either lower or higher
echelon. In problems C213 and C983, more information
needs to be gathered, and the recommendations
of the basic problem need to be
modified. In problem P526, there is enough information
to justify immediate concurrence. In
problem M552, advice should be given, but personal
charge of the project should not be taken.
Most of the problems in this cluster came to be
regarded as measures of the “Guidance of Decision-Making”
class of behavior.

In general, the evidence presented sup- 
ports the results of the armchair analysis 
but suggests that an improved basis of 
classification of school objectives could 
be found which would permit the devel- 
opment of reasonably homogeneous sub- 
tests of sufficient reliability for individ- 
ual measurement. Further improvement 
might come from empirical refinement of 
the scoring weights attached to the types 
of response. The present test materials 
seem to provide an adequate basis for 
group comparisons and correlational 
studies, by utilizing the group of prob- 
lems within the experimental in-baskets 
for which there is some evidence of con- 
tent reliability. The reliability of the In- 
Basket Test in its present pretest form 
is satisfactory for certain limited uses 
(those involving comparisons of groups 
of reasonable size), and with further de- 
velopment the reliability might become 
sufficiently high to justify use of the test 
in individual selection or placement.


Validity

The term validity may be thought of 
in several senses. The simplest concept 
of validity is “predictive” validity; pre- 
dictive validity may be measured by the 
correlation between scores on the test and 
some measure of success in the activity 
the test is supposed to predict. The In- 
Basket Test is not primarily intended to 
forecast success. It was intended rather as 
a measure of achievement which would 
be useful in the evaluation of instruction. 

TABLE 14

CORRELATIONS OF ROLE TOTAL SCORES AND
GRAND TOTAL SCORE WITH ACE TOTAL
SCORE, AND CSS FINAL GRADE FOR
CSS SAMPLE III*

Variables                     ACE Total Score   CSS Final Grade
Total Score, CO Problems            .24               .14
Total Score, D/M Problems          -.01               .10
Total Score, D/P Problems           .12               .12
Total Score, D/O Problems           .25               .02
Grand Total Score                   .25               .15

* N=92.


How can an achievement test be validated?
It might be argued that an
achievement test does not need to be validated;
insofar as the test directly represents
the skills it is intended to measure,
the test is “self-validating.” Probably no 
one would quarrel about such a defini- 
tion of validity for a test of (say) long 
division. In the present instance, how- 
ever, there might be more disagreement 
about whether or not the test does repre- 
sent the skills taught; the objectives of 
the instruction in the Command and 
Staff School are not quite as neatly de- 
fined by a set of test items as are the ob- 
jectives of an arithmetic course. 

In the long run, the validation of a 
test depends upon building up a wealth 
of experience which is consistent with a 
particular definition of a test. We tend to 
accept a test as valid to the extent that 
over a period of time it is found to cor- 
relate positively with the measures which 
we feel the test should correlate with and 
not with things we feel it should not 
correlate with. Gulliksen (5) has called 
such a concept of test validity “intrinsic 
validity.” 

Correlations between In-Basket Test
scores and four other kinds of data have
been computed. None of these other variables
is a “criterion.” The correlational
results are presented merely as a contribution
to our understanding of what is
and what is not measured by the In-Basket
Test.


TABLE 15

BREADTH OF EXPERIENCE AS RELATED TO ROLE TOTAL SCORE ON THE IN-BASKET TEST

Narrow Experience

Primary Air Force Specialty Code: Personnel, Administration, Maint. & Supply, Operations
Column headings for each role (CO, D/M, D/P, D/O): n, Est. Mean, Est. SD

-4.12
-2.50

-2.56
-1.78
-2.25
-1.88

F=0.11   Not significant
F=2.05   Not significant
F=0.79   Not significant

6.36
2.70

Broader Experience

Primary Air Force Specialty Code: Personnel, Administration, Maint. & Supply, Operations

0.71
6.32
2   0.50
5
3.21   7

-1.40
-0.71
-2.56

3.00
3.40
2.88

2.30   3.56
4.10

F=0.36   Not significant
F=0.07   Not significant

The four kinds of data are as follows:
(a) scores on the American Council on
Education Psychological Examination
(the “ACE”); (b) CSS course grades; (c)
statements about type of Air Force experience;
and (d) statements about
breadth of Air Force experience.

Table 14 presents the correlations of 
the first two kinds of data with the four 
role total scores and over-all total score 
on the In-Basket Test. The over-all total 
score correlates .25 with ACE total score, 
which is clearly significant from a statisti- 
cal viewpoint. Such a correlation with a 
measure of mental ability is to be ex- 
pected in a test that measures any sort of 
intellectual function. 

Since the ACE total score and the CSS
grades are correlated only to the extent
of -.02 in our sample, we may regard the
CSS grade as being completely independ-
ent of ACE score. The correlation of the
in-basket total score with CSS grades was
found to be .15. While this correlation is
on the borderline of statistical signifi- 
cance when evaluated with a one-tailed 
test against the null hypothesis, certainly 
it does not represent a relationship of 
much practical importance. It cannot be 
determined from the available data 
whether the low correlation observed 
here is due to relatively low reliability 
for the CSS grades (as well as for the 
present form of the In-Basket Test), or to 
lack of any strong relationship between 
the two measures. In view of the obser- 
vations noted above which suggest that 
some of the objectives of the course may 
prove to be diametrically opposed to 
other objectives of the course, we per- 
haps should not expect a very high cor- 
relation. It must be remembered also that 
the In-Basket Test was administered early 
in the course, not as a final examination. 
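
The significance statements above can be checked with the usual t transformation of a product-moment correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below is illustrative only. The sample size n = 120 is a hypothetical value, since the number of cases behind Table 14 is not restated here, and the one-tailed probability uses a normal approximation to the t distribution. The function names are likewise assumptions made for the example.

```python
import math
from statistics import NormalDist


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing a product-moment correlation against zero."""
    if n <= 2:
        raise ValueError("need more than two cases")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def one_tailed_p(t: float) -> float:
    """Approximate upper-tail probability, treating t as a normal deviate.

    The approximation is reasonable only when n is large.
    """
    return 1.0 - NormalDist().cdf(t)


if __name__ == "__main__":
    n = 120  # hypothetical sample size, chosen only for illustration
    for label, r in (("ACE total", 0.25), ("CSS grade", 0.15)):
        t = correlation_t(r, n)
        print(f"{label}: r = {r:.2f}, t = {t:.2f}, "
              f"one-tailed p ~ {one_tailed_p(t):.3f}")
```

With a sample of roughly this assumed size, a correlation of .25 lies well beyond chance expectation, while a correlation of .15 falls close to the one-tailed .05 point.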

Table 15 presents information involv- 
ing the third and fourth types of data 
mentioned above. This table is divided 
into two parts, according to breadth of 
administrative experience. The larger 
group of students, represented at the top 
of the table, are considered to have had
relatively narrow administrative experience,
because their Primary Air Force
Specialty Code (PAFSC) at the time of 
testing fell in the same general area as 
their stated area of greatest experience. 
The remainder of the group, at the 
bottom of the table, are considered to 
have had relatively broad experience, be- 
cause their PAFSC at the time of testing 
did not fall within the same general area 
as their greatest experience. Within these 
major groupings, the cases have been 
divided according to the stated area of 
greatest experience, the areas having been 
chosen so as to correspond as closely as 
possible with the roles of the In-Basket 
Test. For each subgroup the table shows 
the mean and standard deviation for each 
of the four roles of the In-Basket Test. 
The data in Table 15 were analyzed 
using analysis of variance, and the F 
values which resulted are shown. Com- 
paring the score variance within these 
subgroups, we find that all but two of the 
F values are less than unity, and one out 
of eight values reaches significance at the 
.05 level of confidence. In view of the 
number of tests made, even this result 
does not warrant further consideration. 
It should be noted that the great hetero- 
geneity of variances within the groups 
based on area of greatest experience the- 
oretically makes the F test invalid. How- 
ever, the effect produced by this imper- 
fection in the data would be in the 
direction of producing apparent signifi- 
cance when it should not exist. Since the 
results do not appear significant anyway, 
the interpretation seems to be surely 
justified that area of greatest administra- 
tive experience has little if anything to 
do with performance on the In-Basket 
Test. This is an important conclusion, 
because it means that there is no neces- 
sity, for reasons other than face validity, 
to construct a special form of the In-
Basket Test to examine officers in various
areas of specialization. 
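
The remark that one significant F among eight does not warrant further consideration rests on simple arithmetic, sketched below. The calculation assumes that the eight tests are independent and that each is made at the .05 level; the tests in Table 15 share examinees, so the figures are a rough guide only. The function names are illustrative.

```python
from math import comb


def prob_at_least_one(alpha: float, k: int) -> float:
    """Chance that at least one of k independent tests is significant
    at level alpha when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


def prob_exactly(m: int, alpha: float, k: int) -> float:
    """Binomial chance that exactly m of k independent tests are significant."""
    return comb(k, m) * alpha ** m * (1.0 - alpha) ** (k - m)


if __name__ == "__main__":
    alpha, k = 0.05, 8
    print(f"expected number significant by chance: {alpha * k:.2f}")
    print(f"P(at least one significant): {prob_at_least_one(alpha, k):.3f}")
    print(f"P(exactly one significant):  {prob_exactly(1, alpha, k):.3f}")
```

Under these assumptions, at least one of eight tests would reach the .05 level by chance alone about one time in three.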

The other comparison permitted by 
the data of Table 15 is between those 
individuals having broad and those hav- 
ing narrow administrative experience. 
These comparisons were originally sug- 
gested by a close study of the results for 
the D/M problems, where it was found 
that those with broad experience had 
D/M total scores significantly better than 
those with narrow experience, at the .05
level of confidence. As soon as the scores
for the CO problems were available, and
the data were subjected to the same kind
of treatment, it was found that the officers
with broad experience did better, and
that this difference was significant at the
.01 level of confidence. However, when
the relative performance of these groups 
on the D/P and D/O problems was as- 
sessed, no significant differences were 
found. 

None of this information settles the 
question of the validity of the In-Basket 
Test. The findings do represent a be- 
ginning in the development of our un- 
derstanding of the nature of the test. No 
conclusive evidence that the test is valid 
can be offered. In the final analysis, valid- 
ity is confidence in a test which is gen- 
erally borne out by numerous observa- 
tions about the test over a period of time. 


Attitudes Toward the In-Basket Test 


Following the administration of the 
In-Basket Test, a memorandum was 
mailed to the CSS students. This mem- 
orandum provided the students with ad- 
ditional information concerning the pur- 
pose of the project in which they had 
cooperated. 

A list of eight proposals for possible 
future applications of the new technique 
was included in the memorandum. These 
uses were stated, as follows: 


1. To provide a basis for uniform, impartial
evaluation of officer effectiveness at the command
and staff level.

2. To provide a basis for selecting superior
officers for certain types of assignments.

3. To provide a basis for comparing one group
of officers with another group which may have
had different job or educational experience, or
both.

4. To provide an absolute standard of per-
formance in command and staff level jobs.

5. To provide information to a school such as
CSS concerning the general strengths and weak-
nesses of a group of students.

6. To provide students in a school such as CSS
with a realistic sample of the work they will later
meet in command and staff jobs of the Air Force.

7. To provide instructional problems for which
most of the possible alternatives are known in
advance.

8. To provide a further technique of final
evaluation of individual students in a school such
as CSS.


The students were invited to evaluate
the test materials in the light of the eight
proposals, and to include suggestions for
the improvement of the materials if they
saw fit. The questionnaire was worded
as follows:


It is requested that each student indicate his
answer to the following two questions:

(a) Considering the test materials as you saw
them, and answering from your personal point
of view, do you consider them adequate for the
purposes stated above in 2-c? (Insofar as your
answer may be negative, please be specific.)

(b) Do you have any comments or suggestions
for improving the test materials, for the purposes
stated above in 2-c? (Again, if you do, please be
specific. You may omit any comment that applies
only to a single instance in the test materials.)


Of the 500 or so copies of the memo-
randum that were distributed, 335 were
filled out and returned. The majority of
the respondents chose to discuss the ma-
terials generally, rather than to relate
their comments to any specific proposed
application as it was outlined in the
memorandum.


However, 111 of the students did relate their
comments to specific proposals. Their opinions
are represented in Table 16. Within this group
of 111 the tendency seems to have been to ap-
prove of applications of the in-basket technique
to groups, but to object to its use in the evalua-
tion of individuals. These comments, it will be
remembered, applied to the actual test materials
seen, not to the general test idea or to some im-
proved test which might be developed.


TABLE 16

JUDGMENTS OF 111 CSS STUDENTS WITH REGARD TO EIGHT PROPOSALS FOR FUTURE
USE OF THE “IN-BASKET” TEST MATERIALS

                                                               Judgment
Proposed Use of Test*                         “Suitable”    “Possibly     “Of Doubtful  “Not
                                                            Suitable”     Suitability”  Suitable”

1. Basis for officer evaluation at command
   and staff levels                               22             4            10            64
2. Basis for selection of the superior officer
   for special assignments                        28             5             9            58
3. Basis for group comparison                     66             9             7            18
4. Absolute standard of performance in
   command and staff level jobs                    6             7             5            82
5. Indicator of group strengths and
   weaknesses                                     65             6             8            21
6. Realistic work sample for preparing officer
   to face his new job                            51             2             6            41
7. Instructional materials                        59             8             9            24
8. Further technique of final evaluation in
   school such as CSS                             44             6            14            36

Note. Each cell shows the percentage of the group expressing a judgment.
* For exact statement of proposed uses, see the numbered list above.

Of the other 224 respondents, 42 felt the ma- 
terials were adequate, but had no suggestions 
to offer in the light of any of the proposals. The 
remaining 182 evaluated the materials in a 
general way, making no attempt to relate their 
comments to the specific proposals. The re- 
sponses exhibited a great range of opinion, 
varying from profound skepticism of the test 
to enthusiastic approval. 

The most telling criticisms pointed to a need 
for revision of the briefing materials. Some in- 
volved requests for amplification of the informa- 
tion contained in the briefing, such as more 
reference material on the policy of the wing,
fuller organizational and functional charts, more 
information regarding predecessors’ duties, and 
some reference to the handling of classified ma- 
terial prior to placing in the in-basket. 

Perhaps even more vital to the ultimate success 
of the test were suggestions that the briefing 
should ensure an appropriate “set” on the part 
of the student. Some felt that the importance of 
the project should have been more heavily 
stressed if full cooperation was to be obtained. 
Others felt that a fuller and more frank ex- 
planation of the purpose of the test would have 
helped in obtaining this cooperation. 

A large number of students expressed what 
might be interpreted as hostility toward the test- 
ing procedure due to what they felt was an 
“artificial” restriction on normal modes of com- 
munication. Many suggested that memo writing 
is an unimportant tool of the good staff officer, 
and that his skill is exhibited, rather, in per- 
sonal contact with his staff. 

Some of the respondents suggested that this 
attitude toward the test created by the restriction 
on communication might be somewhat allayed 
by more adequate preparation in the initial 
briefing. It was suggested that the examiners 
display in the briefing their awareness of these 
limitations. Some students felt that specific ref- 
erence to means by which the student could 
indicate the content of hypothetical phone calls, 
staff meetings, or informal conversations would 
be helpful. 

A large number of students objected to the 
problems on the basis that they were “unrealis- 
tic,” although the problems have been based 
upon real events in the Air Force. Others felt 
that the high proportion of problems exhibiting 
poor staff action gave an unrealistic picture of 
the base as a whole. 

Some criticisms were directed toward other 
aspects of the test which deviated from actual 
experience, such as failing to provide an op- 
portunity for group discussion and cooperative 
solution of the problems. Some felt that insufficient
background material precluded any basis
for wise action. Those who acted, they felt, would 
be making superficial judgments. Others were 
convinced that the short time allowance militated 
against considered judgments. 


V. RECOMMENDATIONS AND SUGGESTIONS 
FOR FURTHER RESEARCH 


The present In-Basket Test represents 
a first attempt to develop an instrument 
for use in evaluation of education at the 
very high level of complexity which exists 
in the Command and Staff School at Air 
University. It is a prototype of an in- 
strument which might become valuable 
not only for training evaluation but also 
for individual selection, placement, and 
guidance. Now that the pretest of this 
measuring device has been completed, 
recommendations can be made about the 
use of the In-Basket Test in its present 
or an equivalent form, and suggestions 
can be offered which have to do with 
further development of the test. 


Recommendations for Use of the Test in 
Its Present Form 


1. The In-Basket Test may justifiably be used 
in making comparisons between mean scores of 
groups of examinees. This is the use for which 
the test was originally intended. Thus a group 
of CSS students tested at the beginning of the 
course might be compared with a group tested 
at the end of the course. Or a group of CSS 
graduates might be compared with a group of 
officers who are similar to the graduates except 
that they have not been students in CSS. If the 
groups to be compared are reasonably large, at 
least some of the part scores for roles and for 
functional categories could be used in such com- 
parisons as well as total score. Analysis of co- 
variance is the preferred method of analysis if 
suitable measures can be found to provide a 
basis for controlling ability factors in these com- 
parisons. 

2. The In-Basket Test may also be used as 
instructional material. Administration of homo- 
geneous parts of the test at appropriate points 
in the course, followed by critiques of the per- 
formance of the students would seem to be an 
excellent instructional device. This is a use of the 
test which was not planned, but which would be 
warranted to the extent that instructors found
the problems suitable for their objectives. 

3. The frequencies with which the various 
types of response occur in an administration of 
the In-Basket Test may enable instructors to gain 
greater understanding of the results of their in- 
structional efforts as they apply to specific per- 
formances of students. For example, instructors 
might find that in view of the large proportion 
of students making one of the low-rated types of 
response, many students must have failed to 
generalize some point in the instruction. On the
other hand, they might be pleased to find that 
a new method of presentation had increased, 
from one class to the next, the proportion of 
students giving a “good” type of response to sev- 
eral similar problems. Such detailed analysis of 
performance on an examination provides maxi- 
mum guidance to the instructor who wishes to 
improve the quality of his teaching. 

4. The test might be used as a method for 
assessing a group of entering students in order to 
find out their general level of capability in the 
skills measured by the test. Thus the level or 
amount of instruction might be varied in ac- 
cordance with the ability of a particular entering 
class, or the students might be “sectioned” into 
two or three levels of ability in order that the 
level and amount of training would be more 
suited to their needs. 

5. The test should not, in its present form, be 
used as a basis for individual evaluation, either 
for selection, placement, guidance, or measure- 
ment of course achievement. 
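
Recommendation 1 names analysis of covariance as the preferred method for group comparisons. As a minimal sketch of what the adjustment does, the code below computes covariate-adjusted group means using the pooled within-group regression slope. The data, the choice of an ability score as the covariate, and all names are hypothetical. A full analysis would also test the adjusted difference for significance and check that the within-group slopes are similar.

```python
from statistics import mean


def pooled_within_slope(groups):
    """Pooled within-group slope of score on covariate.

    groups maps a group label to a list of (covariate, score) pairs.
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("covariate does not vary within groups")
    return sxy / sxx


def adjusted_means(groups):
    """Group mean scores adjusted to the grand mean of the covariate."""
    slope = pooled_within_slope(groups)
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    result = {}
    for label, pairs in groups.items():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        result[label] = y_bar - slope * (x_bar - grand_x)
    return result


if __name__ == "__main__":
    # Hypothetical (ability score, in-basket total) pairs for two groups.
    data = {
        "start of course": [(1, 2.5), (2, 3.0), (3, 3.5)],
        "end of course": [(3, 4.5), (4, 5.0), (5, 5.5)],
    }
    for label, value in adjusted_means(data).items():
        print(f"{label}: adjusted mean = {value:.2f}")
```

In this made-up example the raw group means differ by 2.0 points, but half of that difference is attributable to the covariate, leaving an adjusted difference of 1.0.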


Suggestions for Further Research 


The pretest form of the In-Basket Test 
represents a novel approach to a measure- 
ment problem, and little recorded ex- 
perience was available to guide the de- 
velopment of the test and scoring pro- 
cedure. Consequently the tryout has re- 
vealed a number of ways in which, it is 
felt, the test and testing procedure could 
be improved. The following suggestions 
concern improvement of the test with 
respect to the breadth of information re- 
flected in its scores, and the reliability of 
measurement. 


1. If one reads a sample of responses to the 
In-Basket Test, he will probably get the impres- 
sion that much more information about the ex- 
aminees is contained in the responses than is 
reflected in the scores reported. The scoring
procedure requires the scorer to decide which of
several types of response a particular response 
resembles, and this involves throwing away a 
good deal of information. The first recommenda- 
tion, therefore, has to do with extending the 
breadth of information obtained from the test 
by suitable revision and extension of the pro- 
cedure for evaluating the responses. 

One revision or extension of the scoring sys- 
tem proposed here is to give the scorer greater 
freedom in awarding or subtracting points for 
exceptionally good or poor performance with 
respect to categories of behavior other than the 
one for which the problem is primarily being 
scored. For example, in a response to a problem
primarily intended to measure foresight, the ex- 
aminee might write something which indicates 
outstanding ability to make effective use of stand- 
ard procedures. Thus he would be given credit 
for this response by tallying a point in the 
“SOP” column of his score sheet. 

Similarly, evaluations might be made of char- 
acteristics that the test was not originally in- 
tended to measure and that perhaps were not re- 
vealed in the content analysis of outcome state- 
ments. For example, various degrees of coopera- 
tiveness seem to be revealed in the writing of 
many of the examinees. Some students go beyond
the call of duty in offering their services in their 
memoranda, while other students never indicate 
in their writing a spontaneous willingness to help 
someone else work out a problem. If these state- 
ments reflect a genuine attitude of cooperative- 
ness, it would seem desirable to work out a 
procedure for recording the instances as they are 
found in the scoring process. Another character- 
istic which at least occasionally seems to be 
revealed is the attitude associated with sympathy 
or harshness in dealing with subordinates. The 
In-Basket Test, in other words, may be thought 
of as a projective test which reveals a great deal 
more about the personality of the writer than is 
revealed by the objective scoring method de- 
scribed in its Appendix. Developmental work 
leading to modification of the test and evaluation 
procedure in an attempt to capitalize on the 
breadth of information which seems to be con- 
tained in the responses seems to be justified. 

2. The most obvious fault of the In-Basket 
Test in its present form is its low reliability. 
While low reliability might to some extent be 
compensated for by breadth of information, 
greater reliability is certainly desirable and 
probably could be attained without necessarily 
decreasing breadth of information. The follow- 
ing suggestions have to do with increasing the 
reliability of the In-Basket Test scores. 

The development of the scoring procedure in- 
volved making a content analysis of the responses 
to a particular problem and then asking military 
experts to evaluate the resulting types of behavior.
One possible reason for low reliability
might be too great an influence of military as 
opposed to psychological judgment in the order- 
ing of these types of responses. High reliability 
implies homogeneity of the items of a test with 
respect to the variable being measured. The pro- 
cedure used in this research may have introduced 
more impurity than was necessary or desirable. 

A second suggestion pertaining to the improve- 
ment of reliability is that the homogeneity of 
the subtests be increased by suitable analytical 
procedures. The major categories of behavior 
which correspond to the most important subtest 
scores (Foresight, Flexibility, etc.) were obtained 
by classifying the statements of outcomes of in- 
struction given by the CSS faculty. An analysis of 
the responses to all the test problems by factor 
analysis (or some analogous method) might lead 
to the discovery of more valid bases for grouping 
items into subtest scores. The cluster analysis 
already performed tends to support the argu- 
ment that homogeneous subtests would result 
from such a procedure. 

Finally, improvement of reliability should result
from more stringent selection and revision
of items.
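
The link drawn above between reliability and item homogeneity can be illustrated with coefficient alpha, one common index of internal consistency. It is used here only as an illustration and is not necessarily the reliability estimate computed in this study; the scores and names below are hypothetical.

```python
from statistics import variance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of items, each a list of examinee scores.

    All items must be scored for the same examinees, in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(item_scores[0])
    if any(len(item) != n for item in item_scores):
        raise ValueError("every item needs one score per examinee")
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores do not vary")
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical weights earned by five examinees on three problems.
    homogeneous = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [1, 2, 3, 4, 5]]
    mixed = [[1, 2, 3, 4, 5], [2, 4, 1, 5, 3], [3, 1, 4, 2, 5]]
    print(f"homogeneous items: alpha = {coefficient_alpha(homogeneous):.2f}")
    print(f"mixed items:       alpha = {coefficient_alpha(mixed):.2f}")
```

Items that rank the examinees in the same order yield a coefficient near 1, while items that rank them inconsistently yield a much lower value, which is the sense in which more homogeneous subtests would be expected to raise reliability.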


In addition to the specific suggestions
made above, it should be emphasized that
if the test is to function as an adequate
evaluation device it must frequently be
used and revised by the instructional
staff. The involvement of instructors,
rather than outside staff, in the continued
development and revision of the test will
be more likely to ensure that the results
of the testing will be used and that the
content of the test is appropriate.


VI. SUMMARY 


This report describes the preparation 
of test materials which were designed to 
aid in the evaluation of instruction in the 
Command and Staff School of Air Uni- 
versity. A major purpose of this course 
is to increase the administrative profi- 
ciency of field grade Air Force officers. 
Test materials for evaluation of instruc- 
tion were considered desirable in order to 
support improvement of curriculum
planning and instructional procedures. 


The test which was developed is a 
situational test which requires the ex- 
aminee to play four roles: Commanding 
Officer, Director of Materiel, Director of 
Personnel, and Director of Operations of 
a fictitious Composite Wing. The exami- 
nee is provided with background infor- 
mation about the wing and the Air Force 
base where it is located, suitable material 
from the “files,” and the contents of an 
in-basket which is appropriate to each 
role (hence the name, In-Basket Test). 
The in-basket contains letters, memo- 
randa, and other documents which em- 
body problems aimed at eliciting be- 
havior relevant to the objectives of 
instruction in the course. The examinee’s 
task is to write the letters and memo- 
randa which he would write if he were 
actually on the job, to sign (or not to 
sign) letters prepared for him, or to take 
other suitable action. Scoring of these 
products is intended to provide informa- 
tion which would be useful to the Air 
University in evaluating the instruction 
in the Command and Staff School. Full 
instructions for administering and scor- 
ing the test are provided in its Appen- 
dices. 

This report also presents the results 
of a tryout of the test. The purpose of 
this tryout was to investigate some of the
statistical properties of the test, student
reaction to the test, and the adequacy of
the instructions and procedures for ad- 
ministration. An entire class in the Com- 
mand and Staff School took the test in 
the third week of the course. 

The results indicate that the In-Basket 
Test can be scored with a reasonably high 
degree of reliability, but that the present 
form of the test is low in content relia- 
bility. There is evidence, however, that 
the content reliability could be improved 
by reassigning scoring weights to the responses;
by eliminating poor items and
revising others; and by increasing the homo-
geneity of the items through use of tech-
niques such as factor analysis.

No criterion of validity was available, 
but correlations with other measures are 
presented which seem to indicate that 
the test is not measuring the same things 
measured either by the American Council 
on Education Psychological Examination or by
the CSS final course grade. There is also 
some evidence that the relationship of 
the in-basket role to the student’s area of 
greatest experience in the Air Force is a 
factor of minor importance in influencing
his score. The opinions expressed by stu-
dents in a posttest inquiry indicated a 
substantial acceptance of the test ma- 
terials for purposes involving compari- 
sons of groups. 

It is concluded from analysis of the 
data obtained from the tryout that the 
pretest form of the test is suitable for 
certain administrative purposes, such as 
comparisons of groups before and after 
training, but not for purposes of indi- 
vidual assessment. It is recommended 
that steps be taken to improve its relia- 
bility before the test is widely used. Sug- 
gestions are presented for extending the 
range of information obtained from the
test, and for increasing its reliability
through reassessment of scoring weights,
improvement in the homogeneity of the
subtests, revision of items, and changes
in the scoring procedure.


APPENDIX 


Several sample problems in the form of fac- 
simile letters taken from the in-basket of the 
Commanding Officer are reproduced below. The
letters which appear in memo form, with the
notation “Form 71W3” at the lower left, were
each typed on a form specially designed for 
this test. It is similar to “buck slip” forms used 
in the Air Force, but not identical to any (see 
Fig. 1). 

The first problem is presented by a memo- 
randum from the Adjutant requesting permission 
to correct his log of classified documents in an 
unauthorized manner. This problem is intended
to measure Efficient Use of Routine. 

The second problem involves a directive to all 
pilots, prepared by the Director of Operations 
for the Commanding Officer’s signature as a re- 
sult of a complaint that pilots have been buzzing 
the campus. The best solutions recognize that 
the Director of Operations’ proposed action is
unimaginative and that a specific positive pro- 
gram for prevention of such violations is re- 
quired. The problem is intended to measure 
Foresight. 

The third sample problem involves a com- 
plaint from a used car dealer that two airmen 
have passed a bad check. The candidate should 
realize that the investigation of the Legal Officer 
was inadequate and that the Commanding Offi- 
cer should insist on a more thorough investiga- 
tion before signing a letter such as has been pre- 
pared for him to sign. The problem is intended 
to measure Guidance of Decision-Making. 

In the fourth problem the Wing Chaplain pre- 
sents a letter, for the Commanding Officer’s sig- 
nature, requesting the Ministerial Association to 
develop increased recreational facilities. The best 
action is one which recognizes that a much 
broader study of the recreation situation is 
needed. The problem is intended to measure 
Guidance of Decision-Making. 


PROBLEM 1 
To: Commandant
From: Adjutant
For: Action
File: C446    Date: 9 July 195x


Subject: Correction of the Log of Classified Documents


REMARKS: 


1. The Administrative Inspector, HQ, 99th
AF, discovered a discrepancy between our Log 
of Classified Documents and the File of Classified 
Documents. 


2. The missing Document is Secret Letter MV 
163-092-21, dated 16 March 195x, from HQ AMC. 


3. The report of the Administrative Inspector 
contains no mention of the specific missing docu- 
ment, but merely mentions a discrepancy between 
the Log and the File of Classified Documents 
which must be corrected. 


4. The document described in Paragraph 2 
was burned by the undersigned after appropriate 
action on the letter had been taken. There were 
unfortunately no witnesses; the undersigned was 
new at his job and did not realize fully the im- 
portance of the action. There is no question that 
the letter was burned. 


5. Permission is requested to correct the Log 
of Classified Documents by drawing up a new 
Log which omits the listing of the document 
described in Paragraph 2, and to destroy the 
present Log of Classified Documents. 


(signed) JOHN T. SMITH
Major, USAF 
Adjutant 
Form 71W3 


PROBLEM 2 
First Letter 


EASTERN UNIVERSITY 
CAROL, MAINE 


Office of the President June 25, 195x 
Colonel Hiram W. Goodfellow 

Commanding Officer 

Pine City Air Force Base 

Pine City, Maine 


Dear Sir: 


I am writing you about a situation that has 
been causing our students, instructors, and ad- 
ministrative officers increasing concern. Our Uni- 
versity has a number of returned fliers in the 
classes. They are utilizing planes at Pine City 
AFB for flying, I suppose partly for amusement 
and partly to earn the appropriate number of
flying hours. 

They are, however, abusing this privilege tre- 
mendously to the detriment of the University. 
A great deal of flying is done over the campus, 
buzzing not only the University buildings, but 
also buzzing students when they are walking 
along relatively open stretches on the campus. 
The noise of the diving planes interferes with 
the lectures and recitations; furthermore, other 
pilots in the class seem to take it as a challenge 
and vow they will come closer than their buddy 
did just then. So far no one has been seriously 
injured or killed; however, it would seem that 
this would not be unlikely if present antics are 
continued. The University is doing everything 
in its power to cooperate with the military and 
to assist in the present emergency. We trust that 
you will be able to prevent a further occurrence 
of such unnecessary annoyances as I have referred 
to. 

Yours truly, 
(signed) MILTON F. JONES
President 
Eastern University 
sbm 


Second Letter 

To: Director of Operations 
From: Commandant 

For: Action 
File: C747    Date: 27 June 195x
Subject: Flying Violations


REMARKS: Please look into this and suggest ap- 
propriate action. 


(initialed) H. W. G. 


Form 71W3 

Third Letter 

To: Commandant 

From: Director of Operations 
For: Signature 
File: C747    Date: 2 July 195x
Subject: Flying Violations


Remarks: I have looked into the facts alleged 
in President Jones’ letter. He has not exag- 
gerated the situation. 


(signed) W. T. THOMPSON
Lt. Col., USAF 


Director of Operations 
Form 71W3 
Fourth Letter

To: Air Operations Inspector 
From: Commandant 

Attn: All pilots 

For: Action 


File: C747    Date: 3 July 195x


Subject: Flying Violations
REMARKS: 


1. It has been brought to the attention of this 
Command that there have been repeated viola- 
tions of paragraph 13, AFR 60-16, Minimum 
Altitude of Flight, in the vicinity of the Eastern 
University Campus, Carol, Maine. 


2. Since this Command has not been informed 
of these violations under the provisions of para- 
graph 49, AFR 60-16, Authorized Deviations, it 
is assumed that these incidents are recognized 
by the pilots concerned as being unauthorized. 


3. No useful military purpose is served by such 
conduct. 


4. The attention of pilots violating cited regu- 
lations is invited to paragraph 9c, AFR 36-57,
Causes for Suspension, Serious Willful Violation 
of Flying Regulations. 


(signed) HIRAM W. GOODFELLOW
Colonel, USAF 
Commander 
Form 71W3 


PROBLEM 3 
First Letter 


W. DOWNE EAST
THE SMILING ESKIMO! 
Pine Plaza West 
Pine City, Maine 


PERSONAL June 28, 195x 


Colonel H. W. Goodfellow 
Commanding Officer 

Pine City Air Force Base 
Pine City, Maine 


Dear Colonel: 


Two weeks ago, two soldiers stationed at your 
base bought a used car from me for $400. The
soldiers were Sgt. Myron Q. Jones and Sgt. 
Herbert K. Snyder. They each gave me a check 
for $203.98, drawn on the National Shawmut 
Bank of Boston, with the story that these were 
their pay checks for May. The checks have been 
returned marked “Depositor Unknown.” They 
asked me not to report that they were out of 
uniform, although this was also true. 

The car has since been sold to Sgt. Homer H. 
Dinger, according to the State Registrar of Mo- 
tor Vehicles. Sergeants Jones and Snyder have 
been sent overseas, according to reports given 
me by soldiers still living in their quarters. 

I wish you would do something about this. 
Either Sergeant Dinger should return the car 
to me, or pay me for it. Jones and Snyder should 
be brought back to stand trial for bad check 
passing. 

Sincerely yours, 
(signed) W. Downe East 
WDE: pac 


Second Letter 
To: Legal Officer 
From: Commandant 
For: Action 
File: C496 Date: 1 July 195x 


Remarks: Please draft an appropriate reply to 
this letter, for my signature. 


(initialed) H. W. G. 
Form 71W3 


Third Letter 


71st COMPOSITE WING 
PINE CITY, MAINE 


C496 3 July 195x 

Mr. W. Downe East 
Pine Plaza West 
Pine City, Maine 

Dear Mr. East: 


Your letter of 28 June, criticizing the actions 
of three of the airmen of this command, has been 
referred to my legal officer for consideration. We 
are sorry to learn of your experiences with them. 

However, we are powerless to be of any as- 
sistance to you in this matter. Sgt. Dinger’s title 
to the car is evidently clear, since the State 
Registrar of Motor Vehicles has accepted it. He 
cannot be expected to pay for the same car 
twice. 

Although we did send a Sgt. Myron Q. Jones 
and a Sgt. Herbert K. Snyder overseas last week, 
there is no proof that they were not impersonated. 
Sgt. Dinger tells us that he conducted all 
his negotiations with them by long distance telephone 
the day after they left here for overseas. 
He wired them their money to Seattle, and they 
mailed him his title. While this is a highly unusual 
way of doing business, the story checks in 
every detail except for positive identification of 
Jones and Snyder. 


Sincerely yours, 


(for signature of) HIRAM W. GOODFELLOW 
Colonel, USAF 
Commander 


PROBLEM 4 

First Letter 
To: Commandant 
From: Chaplain 
For: Signature 
File: C525 Date: 2 July 195x 
Subject: Community Facilities for Recreation 
REMARKS: 

1. In view of the projected expansion of this 
base, I feel that recreational facilities in Pine 
City will soon become a matter of critical concern. 


2. I have prepared the attached letter for your 
signature. I sincerely hope that you will be able 
and willing to speak to the Ministerial Association 
on this matter. Of course, if you so desire, 
I should be glad to do so in your place, but I 
sincerely believe that your personal attention to 
this matter would insure the success of this 
project. 

(signed) S. Y. SOLE 
Major, USAF 
Wing Chaplain 
Form 71W3 


Second Letter 


71st COMPOSITE WING 
PINE CITY, MAINE 


3 July 195x 

The Rev. Mr. Simon T. Pureheart, Chairman 
Pine City Ministerial Association 
Old North Church 
Pine City, Maine 


Dear Reverend Pureheart: 


You have probably seen the announcement 
from Washington that the facilities of Pine City 
Air Force Base are to be markedly expanded. 

We have in the past enjoyed generally har- 
monious relationships with the townspeople. 
There have been numerous instances of activities 
that have been set up in Pine City to provide 
good, wholesome recreation and diversion for 
our men during off-duty hours. However, the 
doubling of the base complement which will 
occur over the next six months poses new prob- 
lems. 

I am therefore writing to you as head of the 
Ministerial Association to ask that your group 
consider what plans the churches might make 
so as to help provide increased facilities for off-duty 
activities. I shall be glad to speak before 
your group if you should desire to invite me to 
do so. 

Sincerely yours, 


(for signature of) HIRAM W. GOODFELLOW 
Colonel, USAF 
Commander 